Overfitting in Synthesis: Theory and Practice (Extended Version)
In syntax-guided synthesis (SyGuS), a synthesizer's goal is to automatically
generate a program belonging to a grammar of possible implementations that
meets a logical specification. We investigate a common limitation across
state-of-the-art SyGuS tools that perform counterexample-guided inductive
synthesis (CEGIS). We empirically observe that as the expressiveness of the
provided grammar increases, the performance of these tools degrades
significantly.
We claim that this degradation is not only due to a larger search space, but
also due to overfitting. We formally define this phenomenon and prove
no-free-lunch theorems for SyGuS, which reveal a fundamental tradeoff between
synthesizer performance and grammar expressiveness.
A standard approach to mitigate overfitting in machine learning is to run
multiple learners with varying expressiveness in parallel. We demonstrate that
this insight can immediately benefit existing SyGuS tools. We also propose a
novel single-threaded technique called hybrid enumeration that interleaves
different grammars and outperforms the winner of the 2018 SyGuS competition
(Inv track), solving more problems and achieving a mean speedup.
Comment: 24 pages (5 pages of appendices), 7 figures, includes proofs of theorems
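To make the idea concrete, here is a minimal sketch of a CEGIS-style loop that interleaves enumeration over several grammars of increasing expressiveness instead of exhausting one grammar before trying the next. The grammar levels, the toy specification, and all helper names below are illustrative assumptions, not the paper's implementation.

    # Toy CEGIS with hybrid enumeration: round-robin over grammars of
    # increasing expressiveness, accumulating counterexamples.
    # All names here are illustrative, not from the paper's artifact.
    import itertools

    # Grammar levels: each is a list of candidate programs (functions of x),
    # ordered roughly by expressiveness.
    GRAMMARS = [
        [lambda x: 0, lambda x: 1, lambda x: x],                 # level 0
        [lambda x: x + 1, lambda x: x - 1, lambda x: 2 * x],     # level 1
        [lambda x: x * x, lambda x: x * x + 1, lambda x: -x],    # level 2
    ]

    def spec(f, x):
        """Toy specification: f(x) must equal x*x + 1."""
        return f(x) == x * x + 1

    def verify(f, domain=range(-50, 51)):
        """Return a counterexample input, or None if f meets the spec."""
        for x in domain:
            if not spec(f, x):
                return x
        return None

    def hybrid_enumerate():
        counterexamples = []
        # Interleave candidates from all grammar levels (round-robin),
        # instead of exhausting one grammar before trying the next.
        for cands in itertools.zip_longest(*GRAMMARS):
            for f in cands:
                if f is None:
                    continue
                # Fast rejection against previously seen counterexamples.
                if all(spec(f, x) for x in counterexamples):
                    cex = verify(f)
                    if cex is None:
                        return f
                    counterexamples.append(cex)
        return None

    solution = hybrid_enumerate()
    print(solution, solution(3) if solution else None)  # expect 10

Running learners over restricted grammars first plays the same role as the parallel multi-learner setup: less expressive grammars are less prone to overfit the counterexamples seen so far.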
FlashProfile: A Framework for Synthesizing Data Profiles
We address the problem of learning a syntactic profile for a collection of
strings, i.e., a set of regex-like patterns that succinctly describe the
syntactic variations in the strings. Real-world datasets, typically curated
from multiple sources, often contain data in various syntactic formats. Thus,
any data processing task is preceded by the critical step of data format
identification. However, manual inspection of data to identify the different
formats is infeasible in standard big-data scenarios.
Prior techniques are restricted to a small set of pre-defined patterns (e.g.,
digits, letters, words), and provide no control over the granularity of
profiles. We define syntactic profiling as the problem of clustering strings
based on syntactic similarity, followed by identifying patterns that succinctly
describe each cluster. We present a technique for synthesizing such profiles
over a given language of patterns, which also allows for interactive refinement
by requesting a desired number of clusters.
Using a state-of-the-art inductive synthesis framework, PROSE, we have
implemented our technique as FlashProfile. Across tasks over large
real-world datasets, we observe a median profiling time of only ~0.7 s.
Furthermore, we show that access to syntactic profiles may allow for more
accurate synthesis of programs, i.e., using fewer examples, in
programming-by-example (PBE) workflows such as FlashFill.
Comment: 28 pages, SPLASH (OOPSLA) 2018
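To illustrate the clustering-then-patterns view of profiling, the sketch below groups strings by a coarse token-class signature and emits one regex-like pattern per cluster. The token classes and all names are simplifying assumptions; FlashProfile itself synthesizes patterns over a richer, user-extensible language via PROSE.

    # Toy syntactic profiler: cluster strings by a token-class signature,
    # then emit one regex-like pattern per cluster. Purely illustrative.
    import re
    from collections import defaultdict

    # Coarse token classes (an assumption; FlashProfile's atoms are richer).
    TOKENS = [
        (re.compile(r"\d+"), r"\d+"),
        (re.compile(r"[A-Za-z]+"), r"[A-Za-z]+"),
        (re.compile(r"\s+"), r"\s+"),
    ]

    def signature(s):
        """Rewrite a string into a sequence of token-class patterns."""
        sig, i = [], 0
        while i < len(s):
            for rx, pat in TOKENS:
                m = rx.match(s, i)
                if m:
                    sig.append(pat)
                    i = m.end()
                    break
            else:                      # literal character (e.g., '-', '/')
                sig.append(re.escape(s[i]))
                i += 1
        return tuple(sig)

    def profile(strings):
        """Cluster by signature; return one pattern per cluster + support."""
        clusters = defaultdict(list)
        for s in strings:
            clusters[signature(s)].append(s)
        return [("".join(sig), len(ms)) for sig, ms in clusters.items()]

    data = ["2024-01-05", "1999-12-31", "N/A", "n/a", "07 Jan 2024"]
    for pattern, count in profile(data):
        print(f"{count:2d}  {pattern}")
    # "2024-01-05" and "1999-12-31" share the pattern \d+\-\d+\-\d+

Requesting a different number of clusters then corresponds to merging or splitting these signature groups at coarser or finer token granularities.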
Learning Nonlinear Loop Invariants with Gated Continuous Logic Networks (Extended Version)
Verifying real-world programs often requires inferring loop invariants with
nonlinear constraints. This is especially true in programs that perform many
numerical operations, such as control systems for avionics or industrial
plants. Recently, data-driven methods for loop invariant inference have shown
promise, especially on linear invariants. However, applying data-driven
inference to nonlinear loop invariants is challenging due to the large number
and magnitude of high-order terms, the potential for overfitting on a small
number of samples, and the large space of possible inequality bounds.
In this paper, we introduce a new neural architecture for general SMT
learning, the Gated Continuous Logic Network (G-CLN), and apply it to nonlinear
loop invariant learning. G-CLNs extend the Continuous Logic Network (CLN)
architecture with gating units and dropout, which allow the model to robustly
learn general invariants over large numbers of terms. To address overfitting
that arises from finite program sampling, we introduce fractional sampling---a
sound relaxation of loop semantics to continuous functions that facilitates
unbounded sampling on the real domain. We additionally design a new CLN activation
function, the Piecewise Biased Quadratic Unit (PBQU), for naturally learning
tight inequality bounds.
We incorporate these methods into a nonlinear loop invariant inference system
that can learn general nonlinear loop invariants. We evaluate our system on a
benchmark of nonlinear loop invariants and show it solves 26 out of 27
problems, 3 more than prior work, with an average runtime of 53.3 seconds. We
further demonstrate the generic learning ability of G-CLNs by solving all 124
problems in the linear Code2Inv benchmark. We also perform a quantitative
stability evaluation and show G-CLNs have a convergence rate of 97.5% on
quadratic problems, a 39.2% improvement over CLN models.
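The gating idea admits a small self-contained illustration. The sketch below applies a gated product t-norm to continuous truth values of equality atoms, in the spirit of G-CLNs; the smoothing function, the gate values, and the sample data are assumptions for illustration, not the paper's architecture.

    # Minimal sketch of gated continuous-logic semantics in the spirit of
    # G-CLNs (illustrative; not the authors' architecture). An atomic
    # constraint's continuous truth value lies in [0, 1]; gates in [0, 1]
    # let training smoothly drop irrelevant atoms from a conjunction.
    import numpy as np

    def eq_truth(x, sigma=0.5):
        """Continuous truth of 'x == 0': equals 1 exactly when x is 0."""
        return np.exp(-(x ** 2) / (2 * sigma ** 2))

    def gated_and(truths, gates):
        """Gated product t-norm: a gate of 0 makes its atom vacuously true."""
        return np.prod(gates * truths + (1.0 - gates))

    # Values of two candidate equality atoms at sampled program states
    # (assumed data): atom 1 holds on every state, atom 2 on none.
    atom_vals = np.array([
        [0.0, 5.0],    # state 1
        [0.0, 11.0],   # state 2
    ])

    gates = np.array([1.0, 0.0])   # training would learn to keep atom 1 only
    for row in atom_vals:
        print(gated_and(eq_truth(row), gates))   # stays near 1.0

Because the gated conjunction stays differentiable, gradient descent can push gates of spurious terms toward 0 rather than overfitting their coefficients to the finite sample.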
Invariant Synthesis for Incomplete Verification Engines
We propose a framework for synthesizing inductive invariants for incomplete
verification engines, which soundly reduce logical problems in undecidable
theories to decidable theories. Our framework is based on the
counterexample-guided inductive synthesis (CEGIS) principle and allows
verification engines to
communicate non-provability information to guide invariant synthesis. We show
precisely how the verification engine can compute such non-provability
information and how to build effective learning algorithms when invariants are
expressed as Boolean combinations of a fixed set of predicates. Moreover, we
evaluate our framework in two verification settings, one in which verification
engines need to handle quantified formulas and one in which verification
engines have to reason about heap properties expressed in an expressive but
undecidable separation logic. Our experiments show that our invariant synthesis
framework based on non-provability information can both effectively synthesize
inductive invariants and adequately strengthen contracts across a large suite
of programs.
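As a rough sketch of this loop (not the paper's framework), the code below has a learner propose conjunctions over a fixed predicate set while an incomplete verifier answers with success, a counterexample, or non-provability feedback that blocks a conjunct from future proposals. Every predicate, state, and the feedback encoding are illustrative assumptions.

    # Toy CEGIS loop with non-provability feedback (illustrative; names,
    # predicates, and the feedback encoding are assumptions, not an API).
    from itertools import combinations

    # Fixed predicate set over a state (x, y); invariants are conjunctions.
    PREDICATES = {
        "x>=0": lambda s: s[0] >= 0,
        "y>=0": lambda s: s[1] >= 0,
        "x<=y": lambda s: s[0] <= s[1],
        "x!=7": lambda s: s[0] != 7,   # true on samples, "unprovable" below
    }

    def verifier(conjuncts, reachable):
        """Incomplete engine: flags a conjunct it cannot reason about
        ('unknown'), returns a counterexample if a reachable state violates
        the candidate, and 'ok' otherwise."""
        if "x!=7" in conjuncts:
            return ("unknown", "x!=7")       # non-provability information
        for s in reachable:
            if not all(PREDICATES[p](s) for p in conjuncts):
                return ("cex", s)
        return ("ok", None)

    def cegis(samples, reachable):
        blocked, cexs = set(), []
        names = list(PREDICATES)
        for size in range(len(names), 0, -1):   # prefer strong invariants
            for cand in combinations(names, size):
                if blocked & set(cand):
                    continue
                states = samples + cexs
                if not all(all(PREDICATES[p](s) for p in cand)
                           for s in states):
                    continue                    # inconsistent with data
                verdict, info = verifier(cand, reachable)
                if verdict == "ok":
                    return cand
                if verdict == "cex":
                    cexs.append(info)
                else:
                    blocked.add(info)           # never propose it again
        return None

    samples = [(0, 0), (1, 2)]
    reachable = [(0, 0), (1, 2), (3, 5)]
    print(cegis(samples, reachable))   # e.g. ('x>=0', 'y>=0', 'x<=y')

The key difference from plain CEGIS is the third verdict: instead of looping forever on a conjunct the engine cannot decide, the learner prunes it and searches the remaining Boolean combinations.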
Data-Driven Learning of Invariants and Specifications
Although the program verification community has developed several techniques for analyzing software and formally proving its correctness, these techniques are too sophisticated for end users and require significant investment in terms of time and effort. In this dissertation, I present techniques that help programmers easily formalize the initial requirements for verifying their programs: specifications and inductive invariants. The proposed techniques leverage ideas from program synthesis and statistical learning to automatically generate these formal requirements from readily available program-related data, such as test cases and execution traces. I detail three of these data-driven learning techniques: FlashProfile and PIE for specification learning, and LoopInvGen for invariant learning.

I conclude with some principles for building robust synthesis engines, which I learned while refining the aforementioned techniques. Since program synthesis is a form of function learning, it is perhaps unsurprising that some of the fundamental issues in program synthesis have also been explored in the machine learning community. I study one particular phenomenon: overfitting. I present a formalization of overfitting in program synthesis, and discuss two mitigation strategies inspired by existing techniques.